Using data from the Seattle Public Library, I will observe trends in
Jane Austen’s book checkouts from three datasets: (1) a dataset of all
checkouts from 2022-2023 (all_checkouts), (2) a dataset of
items checked out at least 10 times a month from 2017-2023
(ten_checkouts), and (3) a dataset of items checked out at
least 5 times a month from 2013-2023 (five_checkouts). I
also plan on analyzing the checkout patterns in the average number of
checkouts (average_checkouts_per_Austen_book), checkout
data for specific books and its format-class during certain months
(most_checkouts_for_emma_digital,
most_checkouts_for_emma_physical), year with the least
number of checkouts for a book
(least_checkouts_for_lady_susan), and total Emma checkouts
from 2022 to 2023 (emma_print_checkouts). By looking at
checkout data for certain books, I hope to gain insight into the
popularity of Austen’s most famous works in relation to time.
When working with this data, I chose to analyze it by a specific author, Jane Austen. I picked Jane Austen because she is a very famous author, and since two of the dataframes used were based on books checked out at least 5 or 10 times a month, I wanted to choose a popular author so that there would be sufficient data to work with since using data from just 2022 and 2023 is quite limited. The areas I wanted to analyze were things like the average checkouts per book to determine her most popular books, the month with the most “Emma” ebook checkouts, the month with the most “Emma” print book checkouts, the Checkout Year for the least checkouts for the book “Lady Susan,” and the number of total Emma checkouts from 2022 to 2023.
When finding these values, I obtained some interesting results for
the month with the most “Emma print book” and the Checkout Year. When
analyzing the most popular “Emma print book” checkout month, I received
a value of 0, but this was because there were no physical copies
available - only digital. For digital, the most popular months were
January and August. For the Checkout Year for the least checkouts for
the book “Lady Susan,” I had originally used
(all_checkouts), but changed it to
(five_checkouts) because it only showed 2022, as there were
only two possible values originally. After changing it, I got 2016,
2017, and 2022.
For average checkouts per book, Emma was the most popular at a little over 39 checkouts, and Pride and Prejudice was the second most popular at around 37 checkouts. The least checked-out book was Raison et Sensibilité, which is the French version of her book Sense and Sensibility (the fourth most checked-out book). It is likely last because it is in French, and Seattle may not have a lot of people who prefer reading in French. I also analyzed the number of total Emma checkouts from 2022 to 2023 and obtained a total of 292.
This data was published by the Seattle Public Library.
The parameters of the data are UsageClass, CheckoutType, MaterialType, CheckoutYear, CheckoutMonth, Checkouts, Title, ISBN, Creator, Subjects, Publisher, and PublicationYear.
The data was collected by the Seattle Public Library storing data each time a person checks out a book. The data is usually updated monthly (with the last update being February 6, 2023).
The data was likely collected for inventory purposes, so they can order more popular books or shorten checkout durations for popular books during historically popular checkout months.
Since the Seattle Public Library is a government entity, there can be privacy concerns over the government having access to the reading trends of the public.
There are some possible limitations to consider when examining this
data, such as the time period it covers. As the data goes back to 2005,
the original file was huge and quite hard to work with all at once.
Therefore, each of the data frames used was condensed to checkouts made
only between 2022-2023 (all), 2017-2023 (ten checkouts), or 2013-2023
(five checkouts), which cover a span of just a few years. Since I am
working with data from a specific author, it makes looking at yearly
data difficult. For example, in my
(least_checkouts_for_lady_susan), I had originally used the
(all_checkouts) data frame, but changed it to the
(five_checkouts) data frame because the value just showed
2022 originally. Since there were only two possible years in
(all_checkouts), it doesn’t create unique/interesting
analysis. Additionally, since the amount of data collected from each
borrower is limited (due to ethical reasons), the scope of the data is
limited and may not show some other interesting details. Some other
challenges with this data are the risks of errors in the data. A few
examples of this include unreturned books causing a decline in the
number of checkouts recorded, books being checked out multiple times by
the same person, or technical errors. Another limitation is that the
number of copies available in the library is not shared. With this
information, we would be able to further analyze the number of checkouts
because it could be due to limited copies.
This time series chart shows the total checkouts for Jane Austen books, categorized by title. The chart reveals that Pride and Prejudice was the most popular book by Austen in 2022, with September 2022 and January 2023 being the most popular months, with 239 checkouts. May 2022, with 132 checkouts, was the least popular month for Pride and Prejudice. The reasons for the spike may include people having more leisure time to read during holiday months, an increase in reading activities in January due to reading goals, or the book being commonly used as required reading in many US English classes.
This time series chart shows the Jane Austen book checkout count based on material type (e.g., book, song, movie, music, magazine). The chart reveals that the audiobook version has become one of the most popular forms of Austen’s book consumption over time. There was a spike at the end of 2018 and early 2019, and it kept increasing until there was a sudden drop in June of 2020. It eventually picked back up in mid-2021 and has continued on an upward trend, with a few occasional dips. The ebook versions of her books have also become extremely popular, starting at the beginning of the pandemic. The physical book has been becoming less popular as it has been on a downward trend since 2013.
This stacked bar chart compares the number of checkouts from the
books “Pride and Prejudiced” and “Pride and Prejudice (unabridged)”.
Since there are many different versions of the same book, I wanted to
observe the difference in checkouts between each book to see if people
preferred the full-unabridged text over the other. Given that this book
is a classic, it is understandable that people interested in literature
would want to read the unabridged version while those who are not would
want to read the regular version. However, due to a limitation in the
data that does not share the total number of copies available, it is not
possible to determine if there are more checkouts due to there being
more regular copies of the book or if it is because more people prefer
the regular versions. Regarding the relationship to the checkout month,
the unabridged version tends to have a consistent checkout count, while
the regular version shows a little more fluctuation. Since there are
more checkouts in January, there could be fewer copies available for
people to check out in February, which could explain the dip in
checkouts during that month. Something that also interested me was that
in (average_checkouts_per_Austen_book), the unabridged
version of most books tended to be more popular.